An Intersection Inequality Sharper than the Tanimoto Triangle Inequality for Efficiently Searching Large Databases

Authors

  • Pierre Baldi
  • Daniel S. Hirschberg
Abstract

Bounds on distances or similarity measures can be useful to help search large databases efficiently. Here we consider the case of large databases of small molecules represented by molecular fingerprint vectors with the Tanimoto similarity measure. We derive a new intersection inequality which provides a bound on the Tanimoto similarity between two fingerprint vectors and show that this bound is considerably sharper than the bound associated with the triangle inequality of the Tanimoto distance. The inequality can be applied to other intersection-based similarity measures. We introduce a new integer representation which relies on partitioning the fingerprint components, for instance by taking components modulo some integer M and reporting the total number of 1-bits falling in each partition. We show how the intersection inequality can be generalized immediately to these integer representations and used to search large databases of binary fingerprint vectors efficiently.
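The partition-count idea in the abstract can be illustrated with a minimal sketch. It assumes fingerprints are stored as Python sets of 1-bit indices and uses an elementary bound that follows from the counts alone: within each residue class modulo M, the intersection cannot exceed the smaller count and the union cannot be smaller than the larger count, so the Tanimoto similarity is at most sum(min) / sum(max). This is not necessarily the exact inequality stated in the paper, and the function names (`tanimoto`, `mod_counts`, `count_bound`, `search`) and the default `m=4` are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Exact Tanimoto similarity |A & B| / |A | B| of two sets of 1-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mod_counts(fp: Set[int], m: int) -> List[int]:
    """Integer representation: number of 1-bits whose index falls in each class mod m."""
    counts = [0] * m
    for i in fp:
        counts[i % m] += 1
    return counts


def count_bound(ca: Sequence[int], cb: Sequence[int]) -> float:
    """Upper bound on the Tanimoto similarity computed from partition counts only.

    Per class, the intersection is at most min(a_i, b_i) and the union is at
    least max(a_i, b_i), so T(A, B) <= sum(min) / sum(max).
    """
    num = sum(min(x, y) for x, y in zip(ca, cb))
    den = sum(max(x, y) for x, y in zip(ca, cb))
    return num / den if den else 1.0


def search(query: Set[int], database: Sequence[Set[int]],
           threshold: float, m: int = 4) -> List[Tuple[int, float]]:
    """Return (index, similarity) for entries with similarity >= threshold.

    Entries whose count-based bound is already below the threshold are
    skipped without computing the exact similarity.
    """
    qc = mod_counts(query, m)
    hits = []
    for idx, fp in enumerate(database):
        # Assumption: a real system would precompute and store these counts.
        if count_bound(qc, mod_counts(fp, m)) < threshold:
            continue  # pruned: the bound rules this entry out
        s = tanimoto(query, fp)
        if s >= threshold:
            hits.append((idx, s))
    return hits
```

With M = 1 the bound reduces to min(|A|, |B|) / max(|A|, |B|), which depends only on the total number of 1-bits; larger M can only tighten it, at the cost of storing M integers per fingerprint.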

Similar resources

The Structure of Bhattacharyya Matrix in Natural Exponential Family and Its Role in Approximating the Variance of a Statistics

In most situations the best estimator of a function of the parameter exists, but sometimes it has a complex form and we cannot compute its variance explicitly. Therefore, a lower bound for the variance of an estimator is one of the fundamentals in the estimation theory, because it gives us an idea about the accuracy of an estimator. It is well-known in statistical inference that the Cramé...

Results on Generalization of Burch’s Inequality and the Depth of Rees Algebra and Associated Graded Rings of an Ideal with Respect to a Cohen-Macaulay Module

Let  be a local Cohen-Macaulay ring with infinite residue field,  an Cohen - Macaulay module and  an ideal of  Consider  and , respectively, the Rees Algebra and associated graded ring of , and denote by  the analytic spread of  Burch’s inequality says that  and equality holds if  is Cohen-Macaulay. Thus, in that case one can compute the depth of associated graded ring of  as  In this paper we ...

On the metric triangle inequality

A non-contradictible axiomatic theory is constructed under the local reversibility of the metric triangle inequality. The obtained notion includes the metric spaces as particular cases and the generated metric topology is T$_{1}$-separated and generally, non-Hausdorff.

Volume difference inequalities for the projection and intersection bodies

In this paper, we introduce a new concept of volumes difference function of the projection and intersection bodies. Following this, we establish the Minkowski and Brunn-Minkowski inequalities for volumes difference function of the projection and intersection bodies.

Using the Triangle Inequality to Reduce the Number of Comparisons Required for Similarity-Based Retrieval

Dissimilarity measures, the basis of similarity-based retrieval, can be viewed as a distance and a similarity-based search as a nearest neighbor search. Though there has been extensive research on data structures and search methods to support nearest-neighbor searching, these indexing and dimension-reduction methods are generally not applicable to non-coordinate data and non-Euclidean distance ...

Journal:
  • Journal of Chemical Information and Modeling

Volume 49, Issue 8

Publication date: 2009